Section: 7. Speech, Audio and image/video processing
Abstract: Selectively cancelling signals at specific
locations within an acoustical environment with
multiple listeners is of significant importance for
home theater, automobile, teleconferencing, office,
industrial and other applications. We have
proposed the eigenfilter for selectively cancelling
signals in one direction, while attempting to
retain them in the other directions. In this
paper we investigate the behaviour of the
performance measure (i.e., the gain) for a vowel
and an unvoiced fricative, when the listener moves
his head, in an automobile type environment. We
show that in such a situation, a large energy in
the difference between the impulse responses at a
listener's location may affect the gain
substantially. Preliminary results also show that
the gain is not significantly affected by
variations in the type of speech signal.

I. Introduction 

Integrated media systems are envisioned to have a signifi- 
cant impact on the way groups of people in remote locations 
communicate with each other. One of the critical elements 
that help enhance the suspension of disbelief required to 
convince people that they are truly in the same environ- 
ment is sound. While a great deal of ongoing research has 
focused on the problem of delivering high quality sound to
a single listener, the problem of delivering the appropriate 
audio signals to multiple listeners in the same environment 
has not yet been adequately addressed. 

In previous work [1], [2], [3] we focused on presenting an
audio signal at a selected direction in a room, while simul- 
taneously minimizing the signal power at another direction. 
For example, in home theater or television viewing appli- 
cations a listener in a specific location in the room may not 
want to listen to the audio signal being transmitted, while 
another listener at a different location would prefer to lis- 
ten to the signal. Consequently, if the objective is to keep 
one listener in a region with a reduced sound pressure level, 
then one can view this problem as that of signal cancellation
in the direction of that listener. Similar applications
arise in the automobile (e.g., when only the driver would
prefer to listen to an audio signal), or any other environment
with multiple listeners in which only a subset wish to
listen to the audio signal.

The authors are with the Immersive Audio Laboratory, Integrated
Media Systems Center, University of Southern California, 3740
McClintock Avenue, Los Angeles, CA 90089-3564.

This research has been funded by the Integrated Media Systems
Center, a National Science Foundation Engineering Research Center,
Cooperative Agreement No. EEC-9529152.

An eigenfilter for selective signal cancellation is designed
by optimizing an objective function as shown in Section
II. Section III summarizes some properties of eigenfilters for
stationary signals. In Section IV we show that the performance
function is affected by certain changes in the responses
(such as head movements) in a simulated automobile
type environment. We confirm these results for two simple
speech signals: (i) an unvoiced fricative /S/ as in sat, and (ii)
a vowel /AE/ as in bat. Section V presents preliminary
results demonstrating that the gain is not significantly affected
when an eigenfilter is designed for one type of speech
signal, but a radically different speech signal is presented
for cancellation. We conclude this paper in Section VI, and
suggest some future directions.

II. The Eigenfilter for Selective Signal 
Cancellation 

An objective criterion is designed for maximizing the difference
in signal power between two different listener locations
that have different source-receiver response characteristics.
For simplicity we assume that the listeners can
be modeled as point receivers. The method can also be extended
to take into account ear spacing and head-related
transfer function effects. The filter derived by optimizing
the objective function, known as the eigenfilter,
operates on the raw signal before the resulting signal is linearly
transformed by the room responses in the direction
of the listeners. Such filters aim at increasing the relative
gain in signal power between the two listeners with some
associated tradeoffs such as: (i) spectral distortion that
may arise from the presence of the eigenfilter, (ii) the
sensitivity of the filter to the length of the room impulse
response (reverberation), (iii) perceptual coloration, and (iv)
sensitivity to spatial variations in the room responses (due
to listener head movements). In this paper we focus on the
sensitivity issue in a space that has the approximate dimensions
of an automobile interior as an example of where
this approach could be implemented. We also investigate
the gain variations with changes in the excitation signal.

A. Determination of the Eigenfilter 

Under our assumption of modeling the listeners as point
receivers we can set up the problem as shown in Fig. 1,
where $w_k$, $k = 0, 1, \ldots, M - 1$ represents the coefficients of
the finite impulse response filter to be designed. During the
design phase we assume that the listeners are stationary.
The listening model is then simply

\[ y_i(n) = h_i(n) \otimes \sum_{k=0}^{M-1} w_k\, x(n-k) + v_i(n), \quad i = 1, 2 \tag{1} \]

where $\otimes$ represents the convolution operation, $h_i(n)$ is the
room response in the direction of listener i, $x(n)$ is the raw
signal, and $v_i(n)$ is the ambient noise at listener i. With this
background, we view the signal cancellation problem as a
gain maximization problem (between two arbitrary listeners),
and state the performance criterion as



\[ J(n) = \max_{\mathbf{w}} \; \frac{1}{2}\,\frac{\sigma^2_{y_2(n)}}{\sigma^2_{v_2(n)}} - \lambda\left(\frac{\sigma^2_{y_1(n)}}{\sigma^2_{v_1(n)}} - \psi\right) \tag{2} \]

in which we would like to maximize the signal to noise ratio
(or signal power) in the direction of listener 2, while keeping
the power towards listener 1 constrained at $10^{\psi_{dB}/10}$
(where $\psi_{dB} = 10\log_{10}\psi$). In (2), $\sigma^2_{y_i(n)}/\sigma^2_{v_i(n)}$ denotes the
transmitted signal to ambient noise power at listener i,
with $y_i(n)$ as defined in (1). The quantity $\lambda$ is the well
known Lagrange multiplier.
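
The step from (2) to an eigenvector solution can be sketched as follows. This is a sketch under stated assumptions rather than the authors' own derivation: the ambient noise variance is taken to be the same value $\sigma_v^2$ at both listeners, the transmitted-signal powers at listeners 2 and 1 are written as the quadratic forms $\mathbf{w}^T\mathbf{A}\mathbf{w}$ and $\mathbf{w}^T\mathbf{B}\mathbf{w}$ with the symmetric matrices $\mathbf{A}$ and $\mathbf{B}$ of (3), $\psi$ denotes the constrained power level at listener 1, and $\mathbf{B}$ is assumed invertible.

```latex
\begin{aligned}
J &= \frac{1}{2}\,\frac{\mathbf{w}^T\mathbf{A}\mathbf{w}}{\sigma_v^2}
     - \lambda\left(\frac{\mathbf{w}^T\mathbf{B}\mathbf{w}}{\sigma_v^2} - \psi\right), \\
\nabla_{\mathbf{w}} J &= \frac{1}{\sigma_v^2}\left(\mathbf{A}\mathbf{w} - 2\lambda\,\mathbf{B}\mathbf{w}\right) = \mathbf{0}
  \;\Longrightarrow\; \mathbf{B}^{-1}\mathbf{A}\,\mathbf{w} = 2\lambda\,\mathbf{w}, \\
\frac{\mathbf{w}^T\mathbf{A}\mathbf{w}}{\mathbf{w}^T\mathbf{B}\mathbf{w}} &= 2\lambda .
\end{aligned}
```

Under these assumptions every stationary point is an eigenvector of $\mathbf{B}^{-1}\mathbf{A}$, and the power ratio it achieves equals the associated eigenvalue, so the ratio is largest for the eigenvector of the maximum eigenvalue. The factor 1/2 only rescales the multiplier and does not change the eigenvector.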

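It can be easily shown, under equal ambient noise, that
the optimal filter, $\mathbf{w}^*$, is an eigenfilter given by

\[ \mathbf{w}^* = \mathbf{e}_{\lambda_{max}}[\mathbf{B}^{-1}\mathbf{A}], \quad
\mathbf{A} = \sum_{p=0}^{L-1}\sum_{q=0}^{L-1} h_2(p)\,h_2(q)\,\mathbf{R}_x(p,q), \quad
\mathbf{B} = \sum_{p=0}^{L-1}\sum_{q=0}^{L-1} h_1(p)\,h_1(q)\,\mathbf{R}_x(p,q) \tag{3} \]

where $\mathbf{e}_{\lambda_{max}}[\mathbf{B}^{-1}\mathbf{A}]$ denotes the eigenvector corresponding
to the maximum eigenvalue $\lambda_{max}$ of $\mathbf{B}^{-1}\mathbf{A}$.
The performance is the gain $G_{dB}$ expressed as,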



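\[ G_{dB} = 10\log_{10}\frac{\sigma^2_{y_2(n)}}{\sigma^2_{y_1(n)}}
          = 10\log_{10}\frac{\mathbf{w}^{*T}\mathbf{A}\,\mathbf{w}^*}{\mathbf{w}^{*T}\mathbf{B}\,\mathbf{w}^*} \tag{4} \]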
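
A minimal numerical sketch of (3) and (4) is given below. It is an illustration under assumptions, not the authors' implementation: the input is taken to be wide-sense stationary with autocorrelation `r_x`, the double sums in (3) are evaluated through the equivalent convolution-matrix form $\mathbf{A} = \mathbf{H}_2^T \mathbf{R} \mathbf{H}_2$ and $\mathbf{B} = \mathbf{H}_1^T \mathbf{R} \mathbf{H}_1$, and the generalized eigenproblem is solved through a Cholesky factor of $\mathbf{B}$ (which requires $\mathbf{B}$ to be positive definite) instead of forming $\mathbf{B}^{-1}\mathbf{A}$ explicitly. The function names and the random toy responses are hypothetical.

```python
import numpy as np

def conv_matrix(h, m):
    """Return the (len(h)+m-1) x m matrix H with H @ w == np.convolve(h, w)."""
    h = np.asarray(h, dtype=float)
    H = np.zeros((len(h) + m - 1, m))
    for k in range(m):
        H[k:k + len(h), k] = h
    return H

def autocorr_matrix(r_x, n):
    """Return the n x n symmetric Toeplitz matrix with entries r_x[|a - b|]."""
    idx = np.arange(n)
    return np.asarray(r_x, dtype=float)[np.abs(idx[:, None] - idx[None, :])]

def design_eigenfilter(h1, h2, r_x, m):
    """Return (w, gain_db): the length-m filter maximizing w'Aw / w'Bw."""
    H1, H2 = conv_matrix(h1, m), conv_matrix(h2, m)
    A = H2.T @ autocorr_matrix(r_x, H2.shape[0]) @ H2   # listener 2 (retain)
    B = H1.T @ autocorr_matrix(r_x, H1.shape[0]) @ H1   # listener 1 (cancel)
    # Solve A w = lam B w through the Cholesky factor of B, which keeps
    # the reduced eigenproblem symmetric.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    S = C_inv @ A @ C_inv.T
    lam, V = np.linalg.eigh((S + S.T) / 2)   # eigenvalues in ascending order
    w = C_inv.T @ V[:, -1]                   # eigenvector of the largest one
    w /= np.linalg.norm(w)
    gain_db = 10 * np.log10((w @ A @ w) / (w @ B @ w))
    return w, gain_db

# Toy example: random stand-ins for room responses and a white input.
rng = np.random.default_rng(0)
h1, h2 = rng.standard_normal(16), rng.standard_normal(16)
r_white = np.zeros(32)
r_white[0] = 1.0
w_opt, g_opt = design_eigenfilter(h1, h2, r_white, m=8)
print(f"gain = {g_opt:.2f} dB")
```

The autocorrelation passed in must cover at least `len(h) + m - 1` lags. Any other filter of the same length should give a gain no larger than `g_opt` for the same responses and input statistics.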



Fundamentally, by casting the signal cancellation problem
as a gain maximization problem, we aim at introducing
a large gain of Q dB between two listeners, $R_1$ and $R_2$.
This Q dB gain is equivalent to virtually positioning listener
$R_1$ at a distance which is $\sqrt{10^{Q/10}}$ times the distance
of listener $R_2$ from a fixed sound source*.
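
As a small worked example of this equivalence, the sketch below converts a gain in dB into the free-field distance ratio implied by the inverse square law. The function name is hypothetical and the 20 dB input is only an illustrative value.

```python
import math

def equivalent_distance_ratio(gain_db):
    """Free-field ratio r1/r2 that yields gain_db under the inverse square law."""
    return math.sqrt(10 ** (gain_db / 10))

# A 20 dB gain corresponds to listener 1 being ten times farther away.
ratio = equivalent_distance_ratio(20.0)
print(ratio)
```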

III. Some Properties of Eigenfilters 
A couple of interesting properties of the proposed eigenfilter
under wide-sense stationary (WSS) assumptions are
restated below.

*Strictly speaking, in the free field, the gain based on the inverse
square law is expressed as $Q = 10\log_{10}(r_1^2/r_2^2)$ (dB), where $r_1$, $r_2$ are
the radial distances of listeners $R_1$ and $R_2$ from the source.



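Property 1: For WSS processes x(n) and y(n) with
finite variances, the matrix $\mathbf{R}_x(p,q)$ is Toeplitz, and the
gain (4) can be expressed as

\[ G_{dB} = 10\log_{10}\frac{\int_{-\pi}^{\pi} |W^*(e^{j\omega})|^2\,|H_2(e^{j\omega})|^2\,S_x(e^{j\omega})\,d\omega}
                            {\int_{-\pi}^{\pi} |W^*(e^{j\omega})|^2\,|H_1(e^{j\omega})|^2\,S_x(e^{j\omega})\,d\omega} \tag{5} \]

where $W^*(e^{j\omega})$ and $H_i(e^{j\omega})$ are the frequency responses of
$\mathbf{w}^*$ and $h_i(n)$, $r_x(k) \in \mathbf{R}_x(k)$ and $S_x(e^{j\omega})$ form a Fourier transform
pair, and $h_1(n)$ and $h_2(n)$ are stable responses. Moreover,
since we are focusing on real processes in this paper, the
matrix $\mathbf{R}_x(k)$ is a symmetric matrix, with

\[ r_x(k) = r_x(-k) \tag{6} \]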
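
The frequency-domain form (5) can be checked numerically. The sketch below is an illustration under assumptions: the integrals are approximated by sums over a uniform DFT grid (the common $d\omega$ factor cancels in the ratio), the power spectral density `psd` is supplied on that same grid, and a white input (flat spectrum) is used for the comparison against the time-domain energy ratio. The function names and random test vectors are hypothetical.

```python
import numpy as np

def gain_db_freq(w, h1, h2, psd=None, nfft=4096):
    """Approximate the gain (5) on a uniform DFT grid of nfft points."""
    W = np.fft.fft(w, nfft)
    H1 = np.fft.fft(h1, nfft)
    H2 = np.fft.fft(h2, nfft)
    # psd holds S_x at the frequencies 2*pi*k/nfft; None means a white input.
    S = np.ones(nfft) if psd is None else np.asarray(psd, dtype=float)
    num = np.sum(np.abs(W) ** 2 * np.abs(H2) ** 2 * S)
    den = np.sum(np.abs(W) ** 2 * np.abs(H1) ** 2 * S)
    return 10 * np.log10(num / den)

def gain_db_time(w, h1, h2):
    """Time-domain gain for a white input: energy ratio of h2*w and h1*w."""
    p2 = np.sum(np.convolve(h2, w) ** 2)
    p1 = np.sum(np.convolve(h1, w) ** 2)
    return 10 * np.log10(p2 / p1)

rng = np.random.default_rng(2)
w = rng.standard_normal(8)
h1, h2 = rng.standard_normal(32), rng.standard_normal(32)
g_f = gain_db_freq(w, h1, h2, nfft=64)   # 64 >= 32 + 8 - 1, so no aliasing
g_t = gain_db_time(w, h1, h2)
print(f"frequency domain: {g_f:.4f} dB, time domain: {g_t:.4f} dB")
```

For a white input the two values agree to numerical precision whenever `nfft` is at least the length of the cascaded response, which is Parseval's relation applied to numerator and denominator.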



Property 2 (Linear phase): The optimal eigenfilter (3)
is a linear phase FIR filter having a constant phase and
group delay, or a constant group delay.

IV. Sensitivity to Spatial Variations of 
Listeners 

The goal in this experiment is to observe the robustness
of the designed optimal eigenfilter to variations in room
responses. For the present situation, we generated synthetic
room responses, in the direction of the two listeners,
using the image method [4] for an automobile enclosure
(dimensions of 2 m x 2 m x 2 m). The relative locations
of the source and the two listeners are shown in Fig.
2, where a single source is assumed to be operating below
and to the left of the driver (e.g., a speaker located
on the driver side door). Listener 2 is assumed to be the
driver, whereas listener 1 is assumed to be the passenger
for designing the eigenfilter. The normal (nominal) positions
of the driver and passenger are denoted by an asterisk.
An eigenfilter was designed for these two locations
(having different responses). A set of four responses was
also synthesized around each listener's head, depicting
head movements of the listeners (indicated by circles).
Two eigenfilters were designed. The first design involved
an unvoiced fricative /S/ as an input to the automobile
enclosure (shown in Fig. 3), whereas the second design
involved a vowel /AE/ as an input to the eigenfilter (shown
in Fig. 4). Once the eigenfilter was determined for the
nominal head locations, the gain (4) was obtained for the
nominal positions, as well as for positions corresponding
to the head variations (while keeping the eigenfilter fixed).
Ideally, the gain should change only negligibly
with listener variations. The order M of the eigenfilter was
set at 100. We are currently investigating the perceptual
effects of filter length on sound quality and will report those
results in the near future.
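
The evaluation procedure described above (filter fixed, responses varied) can be sketched as follows. This is a minimal illustration under assumptions: the input is WSS with autocorrelation `r_x`, the power at a listener is computed from the cascade of the fixed filter and that listener's response, and the one-tap responses in the example are toy values, not image-method responses. The function names are hypothetical.

```python
import numpy as np

def output_power(h, w, r_x):
    """Power of a WSS input with autocorrelation r_x after filtering by w, then h."""
    g = np.convolve(h, w)                       # cascade of filter and room response
    idx = np.arange(len(g))
    R = np.asarray(r_x, dtype=float)[np.abs(idx[:, None] - idx[None, :])]
    return g @ R @ g

def gain_matrix_db(w, driver_responses, passenger_responses, r_x):
    """Entry (i, j): gain (4) for driver position i against passenger position j."""
    p_drv = np.array([output_power(h, w, r_x) for h in driver_responses])
    p_pas = np.array([output_power(h, w, r_x) for h in passenger_responses])
    return 10 * np.log10(p_drv[:, None] / p_pas[None, :])

# Toy check with one-tap responses and a white input.
r_white = np.array([1.0, 0.0, 0.0, 0.0])
G = gain_matrix_db(np.array([1.0]), [[2.0], [1.0]], [[1.0], [0.5]], r_white)
print(np.round(G, 2))
```

The first row and column would hold the nominal positions, so that `G[0, 0]` is the design gain and the remaining entries show how far the gain drifts when either head moves.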

A. Unvoiced Fricative /S/ 

The gain (dB) matrix as a function of spatial variations
is given below. In the matrix, the entry at the i-th row and
the j-th column is the gain for the i-th location of the
driver head against the j-th location of the passenger head
around the nominal position (i = j = 1 indicates the gain
at the nominal head locations for which the eigenfilter
was designed). The nominal positions of the driver and
passenger head are marked by an asterisk. The numbers
in parentheses give the energy in the difference between
the room responses for the nominal positions and the room
responses for the head variations.





Driver \ Passenger    1* (0%)    2 (0%)    3 (82%)    4 (8.%)    5 (59%)
1* (0%)               10.8       10.8      -0.57      6.7        0.5
2 (10.9%)             11.7       11.7       0.37      7.6        1.4
3 (30.3%)             12         12         0.6       0.8        1.6
4 (30.3%)             12         12         0.6       7.8        1.6
5 (10.9%)             11.7       11.7       0.37      7.6        1.4



B. Vowel /AE/ 
The gain matrix for this case is given below: 





Driver \ Passenger    1* (0%)    2 (0%)    3 (82%)    4 (8.%)    5 (59%)
1* (0%)               11.3       11.3      -0.8       6.8        0.44
2 (10.9%)             12.4       12.4       0.17      7.8        1.4
3 (30.3%)             12.7       12.7       0.5       8.1        1.8
4 (30.3%)             12.7       12.7       0.5       8.1        1.8
5 (10.9%)             12.4       12.4       0.17      7.8        1.4



The largest changes in the gain occur when the passenger
head location varies. This is mapped in Fig. 5, which
depicts the energy in the difference between the room
responses for nominal positions and room responses for head
variations*. In summary, the largest changes in the gain occur
for large energy differences between room responses at
the passenger's (listener 1) head. This seems intuitive, since
the driver's response has a dominant direct field component,
which is not substantially affected by head movements because
the driver is close to the source. The passenger's response has
dominant reflective components, which vary significantly with
variations in the head locations.

V. Sensitivity to Varying Excitation Signals 

As can be seen from (4), the gain is affected by variations
in the A and B matrices. Besides the spatial response
variations discussed above, another cause of changes in the
gain is a difference in the signals applied to the eigenfilter
for cancellation. Differences in the signals cause differences
in the second order statistics, thereby affecting the gain.
The goal in this preliminary experiment
is to observe the robustness of the designed optimal eigenfilter
to differences in the excitation signal applied to the
eigenfilter for cancellation. For the present situation, we
used the signals from the previous section. That is, we
designed an eigenfilter (again M = 100) for the unvoiced
fricative having a correlation function shown in Fig. 7(a),
then applied the vowel (which has a radically different
correlation function as shown in Fig. 7(b)) for signal
cancellation, and computed the gain when the head positions
changed. We also observed the effects on the gain when
we combined the two signals as shown in Fig. 6, with the
corresponding correlation function depicted in Fig. 7(c).
The results are presented below for the vowel presentation,
once the eigenfilter was designed for the unvoiced fricative
(again, the normal (nominal) positions of the driver and
passenger are denoted by an asterisk).

*The energy difference between room responses ^ and h 3 is given





Driver \ Passenger    1* (0%)    2 (0%)    3 (82%)    4 (8.%)    5 (59%)
1* (0%)               10.5       10.5      -2.08      ...        ...
2 (10.9%)             ...        ...       -0.9       ...        1.9
3 (30.3%)             12.52      ...       -0.07      7.5        2.8
4 (30.3%)             12.52      12.52     -0.07      7.5        2.8
5 (10.9%)             11.62      11.62     -0.9       6.6        1.9



Comparing the above table with the first table in
Section IV, we see that the gain is not significantly affected
by this new presentation. The results in the above
table do not change much even for the combined vowel-fricative
presentation, after the eigenfilter is designed with
the unvoiced fricative, since the correlation function (Fig.
7(c)) is not much different from that for the vowel case (Fig.
7(b)).
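
The mismatch experiment can be mimicked in a few lines. The sketch below is illustrative only and rests on several assumptions: the correlation function is estimated with the biased sample estimator, both responses have the same length, and the noise-like sequence, the two-sinusoid sequence, the random responses and the random fixed filter are synthetic stand-ins for /S/, /AE/, the room responses and a designed eigenfilter. All names are hypothetical.

```python
import numpy as np

def sample_autocorr(x, nlags):
    """Biased sample autocorrelation r[k] for k = 0, ..., nlags - 1."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(nlags)])

def gain_db(w, h1, h2, r_x):
    """Gain (4) of a fixed filter w for an input with autocorrelation r_x."""
    def power(h):
        g = np.convolve(h, w)
        idx = np.arange(len(g))
        R = np.asarray(r_x, dtype=float)[np.abs(idx[:, None] - idx[None, :])]
        return g @ R @ g
    return 10 * np.log10(power(h2) / power(h1))

rng = np.random.default_rng(1)
n = np.arange(4000)
noise_like = rng.standard_normal(n.size)                 # stand-in for /S/
quasi_periodic = (np.sin(0.05 * np.pi * n)
                  + 0.5 * np.sin(0.1 * np.pi * n))       # stand-in for /AE/

h1, h2 = rng.standard_normal(12), rng.standard_normal(12)
w_fixed = rng.standard_normal(6)        # placeholder for a designed eigenfilter
nlags = len(h1) + len(w_fixed) - 1
g_design = gain_db(w_fixed, h1, h2, sample_autocorr(noise_like, nlags))
g_other = gain_db(w_fixed, h1, h2, sample_autocorr(quasi_periodic, nlags))
print(f"design signal: {g_design:.2f} dB, new signal: {g_other:.2f} dB")
```

Only the autocorrelation changes between the two calls, which isolates the effect of the excitation statistics on the gain for a filter that is held fixed.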

VI. Conclusions 

In this paper we investigated the robustness of the eigenfilter
to changes in head locations of two listeners (i.e.,
changes in the responses at the two listeners) in an
automobile type enclosure for simple speech signals, in terms
of the gain. We observed that the performance is affected
more by the passenger (listener 1) head movements
than by the driver head movements. We believe that this is
because of the larger energy in the difference between the room
responses corresponding to head movements and the nominal
responses. We plan to use perturbation theory to further
investigate and quantify this behavior. Future research will
also be directed to more complex signals, and perceptual
aspects of designing eigenfilters (i.e., gain-perceptual
coloration tradeoffs).

We also performed a preliminary experiment on the
robustness of the eigenfilter to varying excitation signals
having different correlation functions. We see that the gain
is quite stable despite differences in the correlation functions
between a newly presented speech sequence and the
original speech sequence (on which the eigenfilter was
designed).

Fig. 1. The source-two listener model.

Fig. 3. Unvoiced fricative speech signal /S/ as in sat.

Fig. 4. Vowel /AE/ as in bat.

Fig. 5. Energy in the difference between room responses corresponding
to head movements and nominal responses.

Fig. 7. First 301 samples of the correlation function for (a)
unvoiced fricative /S/ (noise-like sequence), (b) vowel /AE/
(quasi-periodic sequence), (c) combined fricative-vowel sequence
/S//AE/.